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We show how random matrix theory can be applied to develop new algorithms to extract dynamic 
£h 1 factors from macroeconomic time series. In particular, we consider a limit where the number of 

random variables N and the number of consecutive time measurements T are large but the ratio 
N/T is fixed. In this regime the underlying random matrices are asymptotically equivalent to Free 
Random Variables (FRV). Application of these methods for macroeconomic indicators for Poland 
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> ; I. INTRODUCTION 
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Macroeconomic time series are characterized by two main featuresxomovements and cyclic phases of expansion 
and depression. Behind the econometric literature, especially on business cycle and VAR modeling there is an idea, 
that essential characteristics of macroeconomic time series are adequately captured by a small number of nearly 
independent factors. 

On the other hand many empirical studies suggest to analyze many variables in order to understand the microeconomic 
mechanisms behind the fluctuations, where both N- the number of variables and T the length of a time series are 
typically large, and typically of the same order. Surprisingly most of the models developed in this field of science can 
precisely describe only static properties of the correlation structure. There is a strong need for models and algorithms, 
that can accurately capture the above described features of macroeconomic time series, are dynamic and allow for a 
\ reacher spatio - temporal structure. In this paper we present new algorithms, based on Random Matrix Theory 
that countenance extracting potentially useful factors in macroeconomic time series. In the first section we uncover 
external temporal correlation structure, using the assumption that the characteristic of each individual time series is 
meticulously depicted by the VARMA (qi, q^) structure with the same parameters. The second method encompasses 
exposition of internal correlation structure between two different sets of variables, after the external correlations are 
properly reduced. 



II. UNRAVELING LAGGED EXTERNAL TEMPORAL CORRELATIONS FROM VARMA(P,Q) 



Finite order vector autoregressive moving average models (VARMA) motivated by Wold decomposition theorem 
[l[ as an appriopriate multivariate setting for studying the dynamics of stationary time series. Vector autoregressive 
(VAR) models are cornerstones in contemporary macroeconomics, being a part of an approach called the "dynamic 
stochastic general equilibrium" (DSGE), which is superseding traditional large-scale macroeconometric forecasting 
methodologies ■ The motivation behind them is based on the assertion that more recent values of a variable are more 
likely to contain useful information about its future movements than the older ones. On the other hand, a standard 



'Electronic address: snarska@th.if.uj.edu.pl 



2 



tool in multivariate time series analysis is vector moving average (VMA) models, which is really a linear regression 
of the present value of the time series w.r.t. the past values of a white noise. A broader class of stochastic processes 
used in macroeconomics comprises both these kinds together in the form of vector autorcgrcssivc moving average 
(VARMA) models. These methodologies can capture certain spatial and temporal structures of multidimensional 
variables which are often neglected in practice; including them not only results in more accurate estimation, but also 
leads to models which are more interpretablc. 



A. Correlated Gaussian Random Variables 



We will consider a situation of N time-dependent random variables which are measured at T consecutive time 
moments (separated by some time interval 5t); let Yi a be the value od the i— th (i = 1,...,N) random number 
at the a-th time moment (a = 1, . . . , T); together, they make up a rectangular N x T matrix Y. Firstly we will 
assume, each Yi a is supposed to be drawn from a Gaussian probability distribution, and that they have mean values 
zero, (Yia) = 0. A set of correlated zero-mean Gaussian numbers is fully characterized by the two-point covariance 
function , Ci a jb = (Yi a Yjb) if the underlying stochastic process generating these numbers is stationary. The stationarity 
condition for VARMA ((ft, q-z) stochastic processes implies certain restrictions on their parameters; for details, we refer 
to 0] . We will restrict our attention to an even narrower class where the cross-correlations between different variables 
and the auto-correlations between different time moments are factorized, i.e., 

(Y ia Y jb ) = C l3 A ah . (1) 



1. Estimating External Cross - Covariances 

The realized cross-covariance between degrees i and j at the same time a is Yi a Yj a , the simplest method to 
estimate the today's cross-covariance dj is to compute the time average (named "Pearson estimator"), 



1 T 1 1 ~ ~ 

= f^YiaYj*, i.e., c = — YY T = — v / CYAY T v / C. (2) 

a=l 



The random matrix c is called a "doubly correlated Wishart ensemble" [4J . 

It is obvious, that our estimator will reflect the true covariances only to a certain degree, with a superimposed 
broadening due to the finiteness of the time series i.e., estimation accuracy will depend on the "rectangularity ratio," 

r-£ (3) 
the closer r to zero, the more truthful the estimate. Furthermore we will consider the situation, where 

N -> oo, T^oo, such that r = fixed. (4) 



B. Free Random Variables Calculus Crash Course 

1. The M -Transform and the Spectral Density 

Let's consider (real symmetric K x K) random matrix H . The average of its eigenvalues Ai, . . . , Xk is concisely 
encoded in the "mean spectral density," 



1 K 1 



K ^ x y " K 

i=i 
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On the practical side, it is more convenient to work with either of the two equivalent objects, 

G H (z) = l(Tr 1 V or M H (z) = zG H {z) - 1, (6) 
A \ zIk - H/ 

referred to as the "Green's function" (or the "resolvent") and the "M-transform" of H ("moments' generating 
function,"). The latter one serves better in the case of multiplication of random matrices. 



2. The N -Transform and Free Random Variables 



The doubly correlated Wishart ensemble c © may be viewed as a product of several random and non-random 
matrices. The general problem of multiplying random variables in classical probability theory can be effectively 
handled in the special situation when the random terms are independent: then, the exponential map reduces it to the 
addition problem of independent random numbers, solved by considering the logarithm of the characteristic functions 
of the respective PDFs, which proves to be additive. In matrix probability theory due to D. Voiculescu and coworkers 
and R. Speicher [1, we have a parallel commutative construction in the noncommutative world (c.f. table [J) . It 
starts with the notion of "freeness," which basically comprises probabilistic independence together with a lack of any 
directional correlation between two random matrices. This nontrivial new property happens to be the right extension 
of classical independence, as it allows for an efficient algorithm of multiplying free random variables (FRV), which we 
state below: 

Step 1: Suppose we have two random matrices, Hi and H2, mutually free. Their spectral properties are best wrought 
into the M-transforms ©, Mn^z) and Mn 2 (z). 

Step 2: The critical maneuver is to turn attention to the functional inverses of these M-transforms, the so-called 
"iV-transforms , " 

M H (Nu(z)) = N H (M H (z)) = z. (7) 

Step 3: The iV-transforms submit to a very straightforward rule upon multiplying free random matrices (the "FRV 
multiplication law" ) , 

N Hl n 2 (z) = -^—N Hl (z)N H2 (z), for free H 1; H 2 . (8) 
1 + z 

Step 4: Finally, it remains to functionally invert the resulting iV-transform 
Nh 1 h 2 ( z ) t° g am the M-transform of the product, Mu 1 u 2 ( z )- 



3. Extracting External Correlations 



The innate potential of the FRV multiplication algorithm ([5]) is surely revealed when inspecting the doubly 
correlated Wishart random matrix c = (1/T)VCYAY T \/C ((2|). This has been done in detail in 0, H|, so we will 
only accentuate the main results here, referring the reader to the original papers for an exact explanation. The idea 
is that one uses twice the cyclic property of the trace (which permits cyclic shifts in the order of the terms), and 
twice the FRV multiplication law ([8]) (to break the ^-transforms of products of matrices^ down to their constituents), 
in order to reduce the problem to solving the uncorrelated Wishart ensemble (1/T)Y T Y. This last model is further 
simplified, again by the cyclic property and the FRV multiplication rule applied once, to the standard GOE random 
matrix squared (and the projector P = diag(ljv, Ot-n), designed to chip the rectangle Y off the square GOE), whose 
properties are firmly established. Let us sketch the derivation, 

cyclic FRV cyclic 
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Classical Probability 

x - random variable, p(x) 
pdf 

characteristic function g x (z) = (e tzx ) 



Noncommutative probability (FRV) 

H - random matrix, P(H) 
spectral density g(\)d\ 
Green's function Gh(z) = (Tr 

or M - transform M(z) = zGh(z) — 1 
independence freeness 
Addition of independent r.v.: Addition of f.r.v. 

The logarithm of the characteristic function, The Blue's function 

r x (z) = \ogg x (z), is additive, Gh(Bh(z)) = Bh(Gh(z)) = z, is additive 

r xl+X2 (z) = r xi (z) + r X2 (z) B Hi +h 2 {z) = B Hl {z) + Bh 2 (z) - - 



Multiplication of independent r.v.: 
Reduced to the addition problem 
via the exponential map, owing to 



Multiplication of free r.v.: 
The N - transform, 
M H (N H (z)) = N H (M„(z)) = z, 

is multiplicative 
N Hi h 2 (z) = ^-N Hl {z)N H2 (z) 



TABLE I: Parallel between Classical and Noncommutative probability 



cyclic FRV 

Z Z T Z 

' : N ^y. A {rz)N c (z) == 7—7- Ni^{rz)N A {rz)N c (z) 



1 + Z T 



1 + z 1 + rz t 
rzN A (rz)N c (z). 



(9) 



This is the basic formula. Since the spectral properties of c are given by its M-transform, M = M c (z), it is more 
pedagogical to recast (|9]) as an equation for the unknown M, 



z = rMN A (rM)N c (M). 



(10) 



It provides a means for computing the mean spectral density of a doubly correlated Wishart random matrix once the 
"true" covariance matrices C and A are given. 

In this communication, only a particular instance of this fundamental formula is applied, namely with an arbitrary 
auto-covariance matrix A, but with trivial cross-covariances, C = ljv- Using that N\ K {z) = 1 + l/z, equation (|10[) 
thins out to 



rM = M A 



r(l + M) 



(11) 



which will be strongly exploited below. 

For all this, we must in particular take both N and T large from the start, with their ratio r = N/T fixed ((H). 
More precisely, we stretch the range of the a-index from minus to plus infinity. This means that all the finite-size 
effects (appearing at the ends of the time series) are readily disregarded. In particular, there is no need to care about 
initial conditions for the processes, and all the recurrence relations are assumed to continue to the infinite past. 



C. The VARMA(gi,g 2 ) Process 
1. The Definition of VARMA(qi , q 2 ) 

One can define stochastic proces called VARMA(gi, (72) hi the following way as a convolution of VAR(g) and 
VMA(g) processes: 



0=1 



(12) 



Q = 
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To be more specific, we consider a situation when N stochastic variables evolve according to identical independent 
VARMA((7i, ^(vector autoregressive moving average processes, , which we sample over a time span of T moments. 

VMA(g) (vector moving average) process is a simple generalization of the standard univariate weak-stationary 
moving average MA(g). In such a setting, the value Yi a of the i— th [i = 1, . . . , N) random variable at time moment a 
(a = 1, . . . , T) can be expressed as 

9 

^ ia ^ ^ ^a^i.a — a- (1*^) 

Here all the e^s are IID standard (mean zero, variance one) Gaussian random numbers (white noise), {eta^jb) = &ij&ab- 
The a a 's are some (q + 1) real constants; importantly, they do not depend on the index i, which reflects the fact that 
the processes are identical and independent (no "spatial" covariances among the variables). The rank q of the process 
is a positive integer. 

The VAR(g) (vector auto-regressive) processes part is somewhat akin to (flU]) . i.e., we consider N decoupled copies 
of a standard univariate AR(g) process, 

9 

Yia ~ X! b P Y i,a~P = a 0tia- (14) 

0=1 

It is again described by the demeaned and standardized Gaussian white noise 6i a (which triggers the stochastic 
evolution), as well as (q + 1) real constants ao, bp, with (3 = 1, . . . ,q. As announced before, the time stretches to 
the past infinity, so no initial condition is necessary. Although at first sight (|14[) may appear to be a more involved 
recurrence relation for the Yi a 's, it is actually easily reduced to the VMA(q) case: It remains to remark that if 
one exchanges the lj a 's with the ej a 's, one precisely arrives at the VMA(g) process with the constants = 1/ao, 
a? = —bp/ao, ft = 1, . . .,q. In other words, the auto-covariance matrix A^ 3 ^ of the VAR(g) process (|14[) is simply 
the inverse of the auto-covariance matrix A^ 2 ) of the corresponding VMA(g) process with the described modification 
of the parameters, 



A (3) = U(2)\ (15) 



This inverse exists thanks to the weak stationarity supposition. 

The auto-covariance matrix A*- 5 -* of this process is simply the product (in any order) of the auto-covariance 
matrices of the VAR and VMA pieces; more precisely, 

A< 5 ) = fAW) _1 AW, (16) 



where A^ corresponds to the generic VMA(q2) model, while A^ 4 ) denotes the auto-covariance matrix of VMA(gi) 
with a slightly different modification of the parameters compared to the previously used, namely Og = 1, a^ 4 ' = — bp, 
for /3 = l,...,q 1 . 

The Af -transform of A^ 5 ' can consequently be derived from the general formulas 0. We will evaluate here the 
pertinent integral only for the simplest VARMA(1, 1) process, even though an arbitrary case may be handled by the 
technique of residues, 



z (a ai + (ag + a 2 ) h + a oai b\) 
M A (5)(z) = — — p-j— - | -a ai H ; v / = | . (17) 



aoai+5lZ V ^(1 - & i) 2 z-(ao + ai) V(l + hf z - (a - a,) 



D. Example from Macroeconomic Data 



We apply the method described above to Polish macroeconomic data. The motivation behind is twofold. First 
of all economic theory rarely has any sharp implications about the short-run dynamics of economic variables (so 
called scarcity of economic time series). Secondly in these very rare situations, where theoretical models include a 
dynamic adjustment equation, one has to work hard to exclude the moving average terms from appearing in the 
implied dynamics of the variables of interest. 
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1. An Application to Macroeconomic Data 

Let us pursue further the above analysis of the VARMA(1, 1) model on a concrete example of real data. Economists 
naturally think of co-movement in economic time series as arising largely from relatively few key economic factors 
like productivity monetary policy and so forth. Classical way of representing this notion is in terms of statistical 
factor model, for which we allow the limited spatio-temporal dependence expressed via our VARMA(1, 1) model. Two 
empirical questions are addressed in this section: 

1. How the empirical data behave? Does the eigenvalues represent similar structure to financial market data? 
In other words, does the macroeconomic data represent collective response to the shock expressed in term of 
VARMA(1, 1) model or are there any apparent outliers. 

2. Should then the forecasts be constructed using small factor models or are there any non-zero, but perhaps small 
coefficients. If so, then the large scale model framework is appropriate. 

We investigate N = 52 various macroeconomic time series for Poland of length T = 118. They have been selected on a 
monthly basis in such a manner so as to cover most of the main sectors of the Polish economy, i.e., the money market, 
domestic and foreign trade, labor market, balance of payments, inflation in different sectors, etc. The time series 
were taken directly from the Reuters©3000Xtra database. Although longer time series for Poland are accessible, 
we restrict ourselves to the last ten years in order to avoid the effects of structural change. We assume that each 
economic variable is affected by the same shock (i.e., the "global market shock") of an ARMA(1, 1) type with unknown 
parameters, which we are to estimate; the AR part implies that the shock dies quite quickly, while the MA part is 
responsible for the persistency of the shock. To preserve the proper VARMA representation, the original time scries 
were transformed using one of the following methods: 

• First, many of the series are seasonally adjusted by the reporting agency. 

• Second, the data were transformed to eliminate trends and obvious nonstationarities. For real variables, this 
typically involved transformation to growth rates (the first difference of logarithms) , and for prices this involved 
transformation to changes in growth rates (the second difference of logarithms). 

• Interest rates were transformed to first differences. 

• Finally, some of the series contained a few large outliers associated with events like labor disputes, other extreme 
events, or with data problems of various sorts. These outliers were identified as observations that differed from 
the sample median by more than 6 times the sample interquartile range, and these observations were dropped 
from the analysis. 

In fig. [1] we plot these time series (LEFT), and we make (RIGHT) a histogram of the mean spectral density p c (A), 
which we compare to a theoretical prediction solving the FRV equation for (jTTJ) with the estimated values of the 
parameters do, ai, b\. We have also plotted the standard Marcenko-Pastur (Bai-Silverstein) [jjl lioj 



2. Discussion 

The result is important for forecast design, but more importantly, it provides information about the way macroe- 
conomic variable interact. The empirical data as compared to the spectral density equation suggest that a lot of 
eigenvalues, similarly to stock market data express marginal predictive content. One can suppose, that each of the 
economic time series contains important information about the collective movements, that cannot be gleaned from 
other time series. Alternatively, if we suppose that macro variables interact in the simplest low-dimensional way 
suggested by VARMA(1, 1) model, the conformity is nearly ideal (modulo the finite-size effects at the right edge of 
the spectrum). The economic time series express common response to the "global shock" process i.e., each eigenvalue 
now contains useful information about the values of the factors, that affect co-movement and hence useful information 
about the future behavior of economy. Thus, while many more eigenvalues appear to be useful, the predictive com- 
ponent is apparently common to many series in a way suggested by our simplified VARMA(1, 1) model. The above 
analysis on a real - complex systems example - i.e. Economy of Poland, for which we have assumed, that each of the 
time series under study is generated by the same type of univariate VARMA(gi, g^) process reveals a stunning fact, 



7 



Macroeconomic time series for Poland 




2.20174 ±0.16702 
0.10671 ±0.06092 
0.51852 ±0.07015 




Results for VARMA(1 .1 ) model 

Chi A 2/OoF = 0.00298 
R A 2 = 0.90678 
sQ 101034 
a1 -0.22006 ±0.19862 
b1 0.62501 +022171 



40 60 
number of observations 



eigenvalues a 



FIG. 1: LEFT: The original N — 52 time series of length T = 118; they are non-stationary, with the seasonal components. 
RIGHT: The histogram of the mean spectral density pc(A) (the solid black line) compared to the theoretical result obtained 
by numerically solving the resulting sixth-order equation (I17[) (the solid orange line) and the " Wishart-fit" (purple line). 



that again the flawless correspondence between theoretical spectral density and empirical data is found. We are in 
the position, where we cannot reject the hypothesis, there are indeed no autocorrelations among macroeconomic time 
series. One may also argue, that all these time series are closely bounded with the process, which we will identify as 
"global shock-process" i.e., all time series represent the global response complex system under study, to a distortion 
and its adjustment to equilibrium state. This process is of univariate VARMA(q 1 , q 2 ) i.e., ARMA(1, 1) type with 
hidden structure, that has to be revealed based on historical time series data. This crude empirical study allows 
potentially for variety of extensions. At least two are possible. First, the approach used here identifies the underlying 
factors only up to a linear transformation, making economic interpretation of the factors themselves difficult. It would 
be interesting to be able to relate the factors more directly to fundamental economic forces in the spirit of DSGE 
models. Secondly, our theoretical result covers only stationary models, but say nothing about integrated, cointegrated 
and co-trending variables. We know that common long-run factors are important for describing macroeconomic data, 
and theory needs to be developed to handle these features in a large model framework. 



III. UNRAVELING INTERNAL TEMPORAL CORRELATIONS VIA SVD TECHNIQUE 



In order to investigate the temporal properties of the internal correlations between two data sets one is often 
interested not only in the analysis of it's static properties given by Pearson estimator ([2]). C \- = tjtXX t but more 
likely how this features behave over a certain period of time. Again the primary way to describe cross-correlations in 
a Gaussian framework is through the two-point covariance function i|II A|) . 

Cia,jb = (Xi a Xjb) . (18) 

Where Xi a = Xi a — (xi a ) are mean adjusted data, that can be further collected into a rectangular N x T matrix 
R. The average (. . .) is understood as taken according to some probability distribution whose functional shape is 
stable over time, but whose parameters may be time dependent. In previous sections we have used a very simplified 
form of the two-point covariance function (jll Aj) . namely with cross-covariances and auto-covariances factorized and 
non-random ([1]) , 

Ci a jb = CijA a b (19) 

(we have assembled coefficients into an N x N cross-covariance matrix C and a T x T auto-covariance matrix A; both 
are taken symmetric and positive-definite). The matrix of "temporal structure" A is a way to model two temporal 
effects: the (weak, short-memory) lagged correlations between the returns , as well as the (stronger, long-memory) 
lagged correlations between the volatilities (weighting schemes, eg.EWMA [HI). On the other hand, the matrix of 
spatial correlations C models the hidden factors affecting the variables, thereby reflecting the structure of mutual 
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dependencies of the complex system. The salient feature assumed so far, these two matrices were decoupled and the 
assumption about the Gaussianity of random variables provides crude approximation, that variances of all random 
variables always exist. This was sufficient to fully characterize the dependencies of the XiaS. However, in more 
realistic circumstances (i.e., building efficient multivariate models, which help understanding the relation between a 
large number of possible causes and resulting effects) one is more interested in the situations, where the spatio- 
temporal structure does not factorize. Cross-correlations technique (sometimes alluded as "time-lagged correlations 
technique" ) is most likely meets these critical requirements. 

1 T 

Cjg, ja+A (A) = — Xi a Xj a+ A (20) 
a=l 

The precise answer boils down to how to separate the spectrum of such a covariance matrix in the large TV, large T 
limit (i.e., thermodynamical limit), when one can make use of the power of FRV calculus (see [ll] for a solution based 
on the circular symmetry of the problem and Gaussian approximation). In this chapter we will very closely follow 
the method presented in [l3| . where the authors suggested to compare the singular value spectrum of the empirical 
rectangular M x N correlation matrix with a benchmark obtained using Random Matrix Theory results (c.f. [H|), 
assuming there are no correlation between the variables. For T — > oo at N, M fixed, all singular values should be 
zero, but this will not be true if T is finite. The singular value spectrum of this benchmark problem can in fact be 
computed exactly in the limit where TV, M, T — > oo, when the ratios m = M/T and n = N/T fixed. Since the original 
description is veiled, for pedagogical purposes we rederive all these results in the language of FRV presented in the 
table U Furthermore we extend the results obtained in [l5| to meet encounter time-lagged correlations. 



A. Mathematical Formulation of a Problem 

Due to the works it is believed that, the system itself should determine the number of relevant input and 

output factors. In the simplest approach one would take all the possible input and output factors and systematically 
correlate them, hoping to unravel the hidden structure. This procedure swiftly blow up with just few variables. The 
cross - equation correlation matrix contains all the information about contemporaneous correlation in a Vector model 
and may be its greatest strength and its greatest asset. Since no questionable a priori assumptions are imposed, fitting 
a Vector model allows data-set to speak for itself i.e., find the relevant number of factors. Still without imposing any 
restrictions on the structure of the correlation matrix one cannot make a causal interpretation of the results. The 
theoretical study of high dimensional factor models is indeed actively pursued in literature (l8Tl25| . The main aim of 
this chapter is to present a method, which helps extract highly non-trivial spatio-temporal correlations between two 
samples of non-equal size (i.e. input and output variables of large dimensionality) , for these can be then treated as 
"natural" restrictions for the correlations matrix structure. 



1. Basic framework and notation 

We will divide all variables into two subsets i.e., focus on TV input factors X a (a — 1, . . . , N) and M output factors 
V Q (a = 1, . . . , M) with the total number of observations being T . All time series are standardized to have zero mean 
and unit variance. The data can be completely different or be the same variables but observed at different times. 
First one has to remove potential correlations inside each subset, otherwise it may interfere with the out-of-samplc 
signal. To remove the correlations inside each sample we form two correlation matrices, which contain information 
about in-thc-sample correlations. 

c x = \xx T c Y = iyv T (21) 

The matrices are then diagonalized, provided T > N,M , and the empirical spectrum is compared to the theoretical 
Marcenko-Pastur spectrum [l(| [26l428j in order to unravel statistically significant factors. The eigenvalues, which lie 
much below the lower edge of the Marcenko-Pastur spectrum represent the redundant factors, rejected by the system, 
so one can exclude them from further study and in this manner reduce somewhat the dimensionality of the problem, 
by removing possibly spurious correlations. Having found all eigenvectors and eigenvalues, one can then construct a 
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set of uncorrelated unit variance input variables X and output variables Y . 

*~ - jk vTx - *~ - vm"^ (22) 

where V,U, A Q , X a are the corresponding eigenvectors and eigenvalues of Cx , Cy respectively. It is obvious, that 
C x = XX T and C Y — YY T are identity matrices, of dimension, respectively, N and M. Using general property 
of diagonalization, this means that the T x T matrices Dj> = X T X and D y = Y T Y have exactly N (resp. M) 
eigenvalues equal to 1 and T — N (resp. T — M) equal to zero. These non-zero eigenvalues are randomly arranged on 
a diagonal. Finally we can reproduce the asymmetric M x N cross-correlation matrix G between the Y and X: 

G = YX T (23) 

which includes only the correlations between input and output factors. In general the spectrum of such a matrix is 
complex, but we will use the singular value decomposition (SVD) technique (c.f. (29| ) to find the empirical spectrum 
of eigenvalues. 



2. The Singular Value Decomposition 



The singular value spectrum represent the strength of cross-correlations between input and output factors. Sup- 
pose G is an M x N matrix whose entries are either real or complex numbers. Then there exists a factorization of 
the form 

G = C/E0 (24) 

where U is an M x M unitary matrix. The columns of U form a set of orthonormal "output" basis vector directions 
for G - these are the eigenvectors of G^G. E is M x N diagonal matrix with nonnegative real numbers on the 
diagonal, which can be thought of as scalar " gain controls" by which each corresponding input is multiplied to give a 
corresponding output. These are the square roots of the eigenvalues of GG^ and G^G that correspond with the same 
columns in U and V. and denotes the conjugate transpose of V, an N x N unitary matrix, whose columns form 
a set of orthonormal "input" or vector directions for G. These are the eigenvectors of GG^ . A common convention 
for the SVD decomposition is to order the diagonal entries Ej^ in descending order. In this case, the diagonal matrix 
E is uniquely determined by G (though the matrices U and V are not). The diagonal entries of E are known as the 
singular values of G. 



B. Singular values from free random matrix theory 



In order to evaluate these singular eigenvalues, assume without loss of generality M < N. The trick is to consider 
the matrix M x M matrix GG T (or the N x N matrix G T G if M > N), which is symmetrical and has M positive 
eigenvalues, each of which being equal to the square of a singular value of G itself. Furthermore use the cyclic 
properties of the trace. Then non-zero eigenvalues of 

GG T = YX T XY T 

are then the same (up to the zero modes) as those of the T x T matrix 

D = D ± D Y = X T XY T Y 

obtained by swapping the position of Y from first to last. In the limit N, M, T — > oo where the X's and the Y's are 
independent from each other, the two matrices and D y arc mutually free [30| . and we can use the results from 
FRV, where given the spectral density of each individual matrix, one is able to construct the spectrum of the product 
or sum of them. 
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1. FRV Algorithm for Cross-correlation matrix 



As usual we will start with constructing the Green's function for matrices and Dy. Each of these matrices, 
have off-diagonal elements equal to zero, while on diagonal a set of M (or N respectively) randomly distributed 
eigenvalues equal to 1 



n — 1 / M _i_ T — M \ _ m , 1-m M. 

<^D ir — — I — ~r H ; — I — — ~r H — TTl 



T U-l z J z-1 T z T ' 
1 ( N , T-N \ _ n i 1-n _ _ iV . 



Then: 



or equivalcntly 



where: 



From this, one easily obtains: 



N D . (z)m 

n c f(»)-i +l-m-l = 2 



and one readily gets the iV-transform for the matrix D-^ • D 



(25) 



S D .. D .(z) = S D Jz)-S D Jz) (26) 



±±ZN Djt . D r(z)=N Dt (z)-N D9 (z), (27) 



S*(z) = — X*(z) N x (z) = — ^ N x (z)G (N x (z)) -1 = 2 



(28) 



N Dt (z) = Nz, . (z) = (29) 



^. D Az)= im + Z) i n + Z) 00) 



Y 

_ (m + z)(n + z) 

N »*-M*)- Z( i + Z ) ( 31 ) 

Inverting functionally (|31l) 

N Dl . D ,^)G D (% rD .(^)) =z + l (32) 
i.e., solving the second order equation in z, one is able to find the Green's function of a product ■ D Y 

= z 2 (l - N(z)) + (n + m - N(z))z + mn 
r(N( \\ — 2 -("+"+ jV ( z ))-y / ("+ m - jV (- g )) 2 - 4 ( 1 - jV ( z ))"^ 

U(1\{Z)) — 2N(z)(l-N(z)) ' 

where we have omitted the subscripts for brevity. Subsequently mean spectral density is obtained from the standard 
relation 

lim — !— = PV ( i ) - iw5(\) => p D (A) = --ImGWA + ie). (34) 
e^o+ A + it \XJ 7r 

The final result i.e., the benchmark case where all (standardized) variables X and Y are uncorrelated, meaning that 
the ensemble average E(Cx) = E{XX T ) and E(Cy) = E(YY T ) are equal to the unit matrix, whereas the ensemble 
average cross-correlation E(G) = E(YX T ) is identically zero, reads as in original paper (l3| : 

Pd(A) = max(l — n, 1 — m)S(X) + max(m + n — 1, 0)5(A — 1) + 
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FIG. 2: Simulation of a continuous part of the theoretical random singular value spectrum p(A) for different values of n and 
m.It is obvious to see that A+ < 1 for all values of n, m < 1. The upper bound is reached only when n + m = 1, in which 
case the upper edge of the singular value band touches A = 1 i.e., for n = m the spectrum extends down to A = 0, whereas for 
n + m — > 1, the spectrum develops a (1 — A) -1 '' 2 singularity, just before the appearance of a S peak at A = 1 of weight n + m — 1. 



R Cv /(A 2 -s-)(A+ -s 2 ) 
ttA(1-A 2 ) 



(35) 



where s± = n + m — 2mn ± 2y/mn(l — n)(l — to) are the two positive roots of the quadratic expression under the 
square root It is easy to discover the fact, that in the limit T — > oo at fixed TV, M, all singular values collapse to zero, 
as they should since there is no true correlations between X and Y; the allowed band in the limit n, m — > becomes: 



(36) 



When tl — y ttl, the support becomes A S [0, 2y/ m(l — to)] (plus a 5 function at A = 1 when n + m > 1), while when 
m = 1, the whole band collapses to a S function at A = yl — n. For n + m — > 1 _ there is an initial singularity of 
p(A) A = 1 diverging as (1 — A) -1 / 2 . Ultimately m — > at fixed n, one finds that the whole band collapses again to a 
5 function at A — \fn. 



2. SVD cleaning technique and the MP case 



The results from the previous section were obtained under belief there were no correlations between input and 
output samples of infinite sizes. However, for a given finite size sample, the eigenvalues of Cx and Cy will differ 
from unit, and the singular values of G will not be zero and instead cross-correlations between input and output 
variables are involved. The SVD spectrum in that case is the convolution of two Marcenko-Pastur [l(| distributions 
with parameters m and n, respectively, which reads, for r = n, m < 1: 



Pmp(X) = 2^ x Rc v /(A-A_)(A + -A) 

with X± = (1 ± \fr) 2 The /V-transform of this density takes a particularly simple form (cf. 

l + z 



(37) 



for an exact derivation) 



Nmp(z) = 



1 + rz 



(38) 
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The singular values of G are obtained as the square-root of the eigenvalues of D = X T XY T Y . Under assumption, 
that X T X and Y T Y are mutually free, after having noted that the ./V-transform of the T x T matrices X T X and 
Y T Y are now given by: 

N(,) = + + ^ (39) 
rz 

one can again use the multiplication rule of iV-transforms and finds the Green's function of D by solving the following 
cubic equation for z: 

(1 + z)(l + nz)(l + mz)N(z) - mnz = (40) 

which with little effort can be solved analytically. Then Green's function is readily obtained by inserting the solution 
of the eq.flU) 

G(N(z)) = Z(N ff z } } + 1 P(A 2 ) = -hmG(X + ie) (41) 
This will lead to a rather complicated form of the final function 

/ o \ V 3 9-1/2 . . 

'< a >=(wi) V( 2 " ,3+ ^ 2 >) < 42 > 

where 

(/s(A 2 ) = 2 - 3m(l - m) - 3n(l - n) - imn(n + m - 4) + 2(m 3 + n 3 ) + 9A 2 (1 + m + n) 



6>(A 2 ) = cp(A 2 ) - Vv(A 2 ) - 4(1 + m 2 + n 2 - mn - m - n + 3A 2 ) 3 



C. Example from the Data 

The last decade has been a witness of an enormous progress in the development of small-scale macroeconomic 
models. It's not too much an overstatement to say, that the statistical analysis of VAR models, Kalman filter models 
etc. is nowadays complete. The major issue with these models is that they can accurately approximate small number 
of time series only. On the other hand Central Banks must construct their forecasts in rich data environment (3l| . This 
mismatch between standard macroeconomctric models and real world practice has led to unfortunate consequences. 
Forecasters have had to rely on informal methods to distill information from the available data, and their published 
forecasts reflect considerable judgement in place of formal statistical analysis. Forecasts are impossible to reproduce, 
and this makes economic forecasting a largely non-scientific activity i.e., formal small-scale models have little effect 
on day-to-day policy decisions, making these decisions more ad hoc and less predicable than if guided by the kind of 
empirical analysis that follows from careful statistical modeling. The goal of this research is to use the wide range 
of economic variables that practical forecasters and macroeconomic policymakers have found useful, and establish 
a direction that explicitly incorporates information from a large number of macroeconomic variables into formal 
statistical models. We have focused on two different data sets, namely Polish macroeconomic data and generated set 
of data, where temporal cross - correlations are introduced by definition. The full data set is the same as it was used 
in previous chapter. 



1. Polish Macroeconomic data revisited 

Poland is and interesting emerging market with unique social and business activity in the process of rapid growth 
and industrialization. Wc hope our analysis might be helpful in understanding the factors that helped Poland to 
survive during the 2008 crisis. The main problem to be solved is to choose the correct variables to include. This is the 
familiar problem of variable selection in regression analysis. Economic theory is of some help, but usually suggests 
large categories of variables (money, interest rates, wages, stock prices, etc.) and the choice of a specific subset of 
variables then becomes an open problem. The analysis began with checking, whether the method described in [l3T ] is 
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relevant for describing the relation between the inflation indexes for Polish macroeconomic indexes and other Polish 
macroeconomic data published by different government and non-government agencies. A consumer price index (CPI) 
is a measure estimating the average price of consumer goods and services purchased by households. A consumer price 
index measures a price change for a constant market basket of goods and services from one period to the next within 
the same area (city, region, or nation). It is a price index determined by measuring the price of a standard group of 
goods meant to represent the typical market basket of a typical urban consumer. The percent change in the CPI is 
a measure estimating inflation. It is commonly viewed as the indicator not only the measure of inflation, but rather 
the indicates the change of costs of maintenance. The data set represent a wide range of macroeconomic activity and 
were initially transformed to ensure stationarity and diminish the effects of seasonal components. The same data set 
we have already analyzed in the first part of the paper and the detailed list of all time series can be obtained from 
the author upon request. This time, the whole set of 52 time series, observed on a monthly basis between Jan — 2000 
and Oct — 2009 (T = 118) was divided into two subsets i.e., 

• We have used monthly M = 15 changes of different CPI indicators as our predicted variables (i.e. output sample 
Y) 

• The input sample X consisted of N = 37 monthly changes of economic indicators (eg. sectoral employment, 
foreign exchange reserves, PPI's) as explanatory variables. 

The data were standardized and mean adjusted, but following the general idea of fl3j the input and output samples' 
factors were not selected very carefully, so the data could speak for themselves and system could be able to select 
the optimal combination of variables. The resulting diagrams (sec Figj3j) now demonstrate, that even standardized 




5 10 15 20 25 30 35 2 4 6 8 10 12 14 



FIG. 3: Correlation matrices representing generic in-the-sample correlations. The data were mean adjusted and standardized. 
In a perfect situation, one is expecting that cross-correlations tends to zero, however still nontrivial correlations are present. 
LEFT:Matrix with 37 input variables X. RIGHT: Matrix with 15 input variables Y - components of CPI. 

and converted to stationary time series may represent nontrivial in-the-sample correlations. Short - term economic 
forecasts build from these type data in consequence may be poor and show no sign of improving over time. The 
next step involved cleaning internal correlations in each sample. To do it, we have used equation (|2"Tj) . The effective 
matrices were then diagonalizcd and two sets of internally uncorrelatcd data were prepared. 



2. Results for Equal-time spectra 

From the uncorrelated data we create the rectangular matrix G and diagonalizc it to calculate singular eigen- 
values. Finally we have used the benchmark calculated in equation (|22[) to compare the data with the predicted 
eigenvalue density. For the same data sets we have also created the set of correlated samples i.e., the set, where 
internal spatial cross-correlations were not a-priori removed (sec Fig. [4]). Apparently there is enough idiosyncratic 
variation in standard activity measures like the unemployment rate and capacity utilization, that removing noisy 
components from these might provide a clearer picture of factors affecting inflation. We have excluded from further 
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analysis series responsible for reference NBP bill rate balance of payments, and from the set of explanatory variables 
ordinary measures of inflation - CPI in food sector, beverages and tobacco ans services. This approach allows us to 



- singular values 

- benchmark 





1 1.5 
Singular values 



5 10 15 20 25 30 35 40 45 





FIG. 4: Comparison of cleaning techniques for equal-time SVD spectra. ABOVE: Cumulative singular density and "heat 
map" for the cleaned problem. BELOW LEFT: Empirical density of eigenvalues in the MP 2 framework.BELOW RIGHT: 
Benchmark calculated according to MP 2 spectrum. 



directly reproduce temporal cross-correlations. 

The lack of symmetry condition endure us to focus only on out-of-the-sample correlations without mixing them with 
inner ones and to study temporal properties of such matrix. The results show, that there exists some singular eigen- 
values, which do not fit the benchmark. Among them, the highest singular eigenvalue s\ = 2.5 and the corresponding 
singular eigenvector, represent standard negative correlation between expenses for electricity and net balance of pay- 
ments in the energy and positive correlation between CPI in health sector and unemployment. In our approach we 
can not only observe this relations, but also interpret them in terms of causality. That is, larger unemployment rate 
causes increase in the CPI. There are other non-trivial relations between eg. CPI in telecommunication sector and 
foreign exchange reserves. All of these correlations are however well known from textbooks or can be easily explained 
by means of classical economic theory. When some of the eigenvalues become strongly related, zero modes emerge 
- clearly the majority (around 80% of all eigenvalues are concentrated close to zero, meaning that there are strong 
spatial correlations inside X and Y data set. If we are to use the MP 2 benchmark then it is clear, that empirical 
spectrum is affected by idiosyncratic components, again confirming, that spatial structure strongly interferes with 
temporal, and it is crucial to "remove" redundant factors to avoid spurious (confunding) correlations. 
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3. Solution for lagged spectra 

A natural way to examine macroeconomic data is via factor models. In previous section we have assumed that the 
inflation can be accurately explained by the factor model using relatively small number of latent variables. Pushing 
the factor model one step further, these latent factors might also explain the predictive relationship between current 
values of variables and those of the previous month. The next step of our study involved shifting the input and output 
data set by one observation (one month). The Y were calculate from t = 2,..., 118 and X's for (f = 1, . . . , 117). We 
were motivated by the common belief, that it is the "yesterday" shock, that affects "today" change i.e., there is some 
persistency and long memory within this system. This is the same approach, that underlies the idea of VARMA(gi, 52) 
processes [3[ . The temporal structure (Fig[5]) manifests itself via the existence of significant non-symmetric relation 



Data vs Benchmark Heat map for reduced G 




0.2 0.4 0.6 0.8 1 5 10 15 20 25 30 

s 32 explanatory variables X(t— 1 ) 



FIG. 5: Results for lagged-by-one-month spectra.LEFT: There is a bunch of singular values that do not fit the benchmark. 
The largest s w 0.77 represents the same correlation structure as within unshifted framework. RIGHT: Note, that now we can 
see whole bunch of islands of non-trivial factors, that affect CPI's. 

(represented by one singular eigenvalue, that does not fit the benchmark) between data sets X and Y, that are 
shifted by one month. It is easy to notice, that only few factors are responsible for the model's performance. CPI in 



Y 


X 


Type of correlation 


CPI in communication sector 


completed dwellings 

net balance of payments of goods 

M3 money aggregate 

Employment in manufacturing sector 


negative 


employment in enterprize sector 

Direct investments 

Foreign exchange reserves 

Official reserve assets 

New heavy trucks registration 

Balance of payments - services 


positive 


CPI in clothing sector 


Total export 


negative 


CPI in restaurants and hotels 
sector 


Foreign exchange reserves 


positive 


CPI in transport sector 


Foreign exchange reserves 


positive 


Total production in manufacturing sector 
Total export 


negative 



TABLE II: Factors affecting the set of output variables for lagged spectra, 
telecommunication sector is affected by the largest number of possible explanatory variables (c.f. Table HI)) . Among 
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them the most unexpected is the correlation with heavy trucks. Two or three factors are useful for some categories 
of series, but only a single factor is responsible for the predictability of prices in all sectors. Apparently, the first 
factor is foreign exchange reserves level, and the results say that it is an important predictor of future prices in 
telecommunication, manufacturing and transport sector. We can say that when forecasting inflation a large model 
might be a clue, but if we remove redundant factors the inflation can be forecasted by using simple measures of real 
activity like the unemployment rate, industrial production or capacity utilization. While the first factor is easy to 
interpret, a complete understanding of the results requires an understanding of other factors as well. Unfortunately, 
their interpretation and role in explaining future changes in the consumer prices is an open question. 



D. Conclusions 



We will now recap this illustrative study with few comments: 

• In general both input and output data sets may represent highly complex correlation structure strongly interfered 
by redundant noisy factors. This significant amount of noise need to be carefully eliminated by performing initial 
decoupling of spatial correlations, so these large matrices become mutually free. 

• This is again precisely the case when FRV approach "takes the stage" and reduces the solution to few lines. 

• The procedure tested on real data within the case of unshifted variables hasn't show any significant improvement 
in comparison to standard factor analysis known in econometric literature for similar data sets(32j. For data 
lagged by one observation we have however recovered the sea of different non-trivial relations, and it might be 
interesting to compare these results from a more general perspective of factor models, however no implicitly 
close approach was found in the literature. 
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